Current Issue : July-September Volume : 2024 Issue Number : 3 Articles : 5 Articles
Background: The advent of Next-Generation Sequencing (NGS) has catalyzed a paradigm shift in medical genetics, enabling the identification of disease-associated variants. However, the vast quantum of data produced by NGS necessitates a robust and dependable mechanism for filtering irrelevant variants. Annotation-based variant filtering, a pivotal step in this process, demands a profound understanding of the casespecific conditions and the relevant annotation instruments. To tackle this complex task, we sought to design an accessible, efficient and more importantly easy to understand variant filtering tool. Results: Our efforts culminated in the creation of 123VCF, a tool capable of processing both compressed and uncompressed Variant Calling Format (VCF) files. Built on a Java framework, the tool employs a disk-streaming real-time filtering algorithm, allowing it to manage sizable variant files on conventional desktop computers. 123VCF filters input variants in accordance with a predefined filter sequence applied to the input variants. Users are provided the flexibility to define various filtering parameters, such as quality, coverage depth, and variant frequency within the populations. Additionally, 123VCF accommodates user-defined filters tailored to specific case requirements, affording users enhanced control over the filtering process. We evaluated the performance of 123VCF by analyzing different types of variant files and comparing its runtimes to the most similar algorithms like BCFtools filter and GATK VariantFiltration. The results indicated that 123VCF performs relatively well. The tool’s intuitive interface and potential for reproducibility make it a valuable asset for both researchers and clinicians. Conclusion: The 123VCF filtering tool provides an effective, dependable approach for filtering variants in both research and clinical settings. As an open-source tool available at https:// proje ct123 vcf. sourc eforge. io, it is accessible to the global scientific and clinical community, paving the way for the discovery of disease-causing variants and facilitating the advancement of personalized medicine....
Background: Copy number alterations (CNAs) are genetic changes commonly found in cancer that involve different regions of the genome and impact cancer progression by affecting gene expression and genomic stability. Computational techniques can analyze copy number data obtained from high-throughput sequencing platforms, and various tools visualize and analyze CNAs in cancer genomes, providing insights into genetic mechanisms driving cancer development and progression. However, tools for visualizing copy number data in cancer research have some limitations. In fact, they can be complex to use and require expertise in bioinformatics or computational biology. While copy number data analysis and visualization provide insights into cancer biology, interpreting results can be challenging, and there may be multiple explanations for observed patterns of copy number alterations. Results: We created Control-FREEC Viewer, a tool that facilitates effective visualization and exploration of copy number data. With Control-FREEC Viewer, experimental data can be easily loaded by the user. After choosing the reference genome, copy number data are displayed in whole genome or single chromosome view. Gain or loss on a specific gene can be found and visualized on each chromosome. Analysis parameters for subsequent sessions can be stored and images can be exported in raster and vector formats. Conclusions: Control-FREEC Viewer enables users to import and visualize data analyzed by the Control-FREEC tool, as well as by other tools sharing a similar tabular output, providing a comprehensive and intuitive graphical user interface for data visualization....
Background: Genetic variants can contribute differently to trait heritability by their functional categories, and recent studies have shown that incorporating functional annotation can improve the predictive performance of polygenic risk scores (PRSs). In addition, when only a small proportion of variants are causal variants, PRS methods that employ a Bayesian framework with shrinkage can account for such sparsity. It is possible that the annotation group level effect is also sparse. However, the number of PRS methods that incorporate both annotation information and shrinkage on effect sizes is limited. We propose a PRS method, PRSbils, which utilizes the functional annotation information with a bilevel continuous shrinkage prior to accommodate the varying genetic architectures both on the variant-specific level and on the functional annotation level. Results: We conducted simulation studies and investigated the predictive performance in settings with different genetic architectures. Results indicated that when there was a relatively large variability of group-wise heritability contribution, the gain in prediction performance from the proposed method was on average 8.0% higher AUC compared to the benchmark method PRS-CS. The proposed method also yielded higher predictive performance compared to PRS-CS in settings with different overlapping patterns of annotation groups and obtained on average 6.4% higher AUC. We applied PRSbils to binary and quantitative traits in three real world data sources (the UK Biobank, the Michigan Genomics Initiative (MGI), and the Korean Genome and Epidemiology Study (KoGES)), and two sources of annotations: ANNOVAR, and pathway information from the Kyoto Encyclopedia of Genes and Genomes (KEGG), and demonstrated that the proposed method holds the potential for improving predictive performance by incorporating functional annotations. Conclusions: By utilizing a bilevel shrinkage framework, PRSbils enables the incorporation of both overlapping and non-overlapping annotations into PRS construction to improve the performance of genetic risk prediction. The software is available at https:// github. com/ styvon/ PRSbi ls....
Background: Metabolic pathway prediction is one possible approach to address the problem in system biology of reconstructing an organism’s metabolic network from its genome sequence. Recently there have been developments in machine learning-based pathway prediction methods that conclude that machine learningbased approaches are similar in performance to the most used method, PathoLogic which is a rule-based method. One issue is that previous studies evaluated PathoLogic without taxonomic pruning which decreases its performance. Results: In this study, we update the evaluation results from previous studies to demonstrate that PathoLogic with taxonomic pruning outperforms previous machine learning-based approaches and that further improvements in performance need to be made for them to be competitive. Furthermore, we introduce mlXGPR, a XGBoostbased metabolic pathway prediction method based on the multi-label classification pathway prediction framework introduced from mlLGPR. We also improve on this multi-label framework by utilizing correlations between labels using classifier chains. We propose a ranking method that determines the order of the chain so that lower performing classifiers are placed later in the chain to utilize the correlations between labels more. We evaluate mlXGPR with and without classifier chains on singleorganism and multi-organism benchmarks. Our results indicate that mlXGPR outperform other previous pathway prediction methods including PathoLogic with taxonomic pruning in terms of hamming loss, precision and F1 score on single organism benchmarks. Conclusions: The results from our study indicate that the performance of machine learning-based pathway prediction methods can be substantially improved and can even outperform PathoLogic with taxonomic pruning....
Background: Gene expression may be regulated by the DNA methylation of regulatory elements in cis, distal, and trans regions. One method to evaluate the relationship between DNA methylation and gene expression is the mapping of expression quantitative trait methylation (eQTM) loci (also called expression associated CpG loci, eCpG). However, no open-source tools are available to provide eQTM mapping. In addition, eQTM mapping can involve a large number of comparisons which may prevent the analyses due to limitations of computational resources. Here, we describe TorcheCpG, an open-source tool to perform eQTM mapping that includes an optimized implementation that can use the graphical processing unit (GPU) to reduce runtime. Results: We demonstrate the analyses using the tool are reproducible, up to 18 × faster using the GPU, and scale linearly with increasing methylation loci. Conclusions: Torch-eCpG is a fast, reliable, and scalable tool to perform eQTM mapping. Source code for Torch-eCpG is available at https:// github. com/ kordk/ torch- ecpg....
Loading....